
139 research outputs found

Jointly creating digital abstracts: dealing with synonymy and polysemy

Author: A Ceol
AR Pico
B Mons
C Blaschke
CN Arighi
F Leitner
H Pearson
M Krallinger
M Seringhaus
Martin Kuiper
NE Fuchs
P Jaiswal
RG Côté
S Vercruysse
Steven Vercruysse
T Kelder
T Kuhn
TA Eyre
WA Baumgartner Jr
Publication venue: Springer Science and Business Media LLC

The structural and content aspects of abstracts versus bodies of full text journal articles are different

Author: Alias-i
B Settles
BM Szmrecsányi
C Blaschke
C Friedman
C Gasperin
Christophe Roeder
D Jurafsky
D Klein
DP Corney
G Leroy
Helen L Johnson
I Goldin
J Lin
JG Caporaso
K Bretonnel Cohen
K Verspoor
Karin Verspoor
L Hirschman
L Tanabe
Lawrence E Hunter
M Krallinger
N Elhadad
PG Mutalik
PI Nakov
R Leaman
S Abney
S Agarwal
T McIntosh
W Chapman
W Hersh
WA Baumgartner Jr
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: An increase in work on the full text of journal articles and the growth of PubMed Central have the potential to create a major paradigm shift in how biomedical text mining is done. However, there has so far been no comprehensive characterization of how the bodies of full-text journal articles differ from the abstracts that have been the subject of most biomedical text mining research. Results: We examined the structural and linguistic aspects of abstracts and bodies of full-text articles, the performance of text mining tools on both, and the distribution of a variety of semantic classes of named entities between them. We found marked structural differences, with longer sentences in the article bodies and much heavier use of parenthesized material in the bodies than in the abstracts. We found content differences with respect to linguistic features: three of the four linguistic features that we examined were distributed significantly differently between the two genres. We also found content differences with respect to the distribution of semantic features: densities per thousand words differed significantly for three of the four semantic classes, and there were clear differences in the extent to which the classes appeared in the two genres. With respect to the performance of text mining tools, we found that a mutation finder performed equally well in both genres, but that a wide variety of gene mention systems performed much worse on article bodies than on abstracts. POS tagging was also more accurate in abstracts than in article bodies. Conclusions: Aspects of structure and content differ markedly between article abstracts and article bodies. A number of these differences may pose problems as the text mining field moves toward processing full-text articles. However, these differences also present opportunities for extracting data types, particularly those found in parenthesized text, that are present in article bodies but not in article abstracts.
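The abstract above reports two measurements: semantic-class densities per thousand words, and the amount of parenthesized material. The following is a minimal Python sketch of both. It assumes that mention counts come from some upstream named-entity tagger (not shown) and that parentheses are not nested. The function names are hypothetical, and this is not the authors' pipeline.

```python
import re


def density_per_thousand(mention_count: int, word_count: int) -> float:
    """Entity mentions per 1,000 words of text."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return 1000.0 * mention_count / word_count


def parenthesized_share(text: str) -> float:
    """Fraction of whitespace-separated words that sit inside parentheses.

    Assumes parentheses are not nested; words in nested or unbalanced
    parentheses may be undercounted.
    """
    words = text.split()
    if not words:
        return 0.0
    # Capture the contents of each innermost "( ... )" pair and count its words.
    inside = sum(len(chunk.split()) for chunk in re.findall(r"\(([^()]*)\)", text))
    return inside / len(words)


print(density_per_thousand(12, 4000))  # 3.0
print(parenthesized_share("Binding was reduced (Fig. 2) in mutants"))
```

Word counts here are naive whitespace splits. A real comparison of abstracts and article bodies would use the same tokenizer for both genres.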

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Concept recognition for extracting protein interaction relations from biomedical text

Crossref

Springer - Publisher Connector

PubMed Central

OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression

Background: Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering. Results: OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from 0.26 to 0.72 (precision 0.39 to 0.85, recall 0.16 to 0.85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances. Conclusion: OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation and to provide additional features in systems that integrate multiple sources for information extraction. The open-source OpenDMAP code library is freely available at http://bionlp.sourceforge.net/
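The OpenDMAP evaluation figures combine precision and recall into an F-score. The abstract does not state the weighting, so this sketch assumes the standard balanced F1 measure, which is the harmonic mean of the two. A short Python helper can compute it:

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure: weighted harmonic mean of precision and recall.

    beta = 1 gives the balanced F1 score; beta > 1 weights recall more.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    if beta <= 0:
        raise ValueError("beta must be positive")
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


print(f_score(0.5, 0.5))  # 0.5
```

The reported ranges span three different systems. The lowest precision and the lowest recall should therefore not be paired to recompute the lowest F-score.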

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Biomedical Discovery Acceleration, with Applications to Craniofacial Development

Author: A Amano
A Baumeister
A Cvekl
A Ferrer-Martinez
A Gabow
A Gavalas
A Hollnagel
A Jaimovich
A Karimpour-Fard
A L'Honore
A Nakaya
A Nazarali
A Subramanian
A Visel
A Yamane
A Zanzoni
AK Ramani
AM Edwards
AY Sivachenko
B Kanzler
BJ Daigle Jr
BT Alako
C Faloutsos
C North
C von Mering
CH Yeang
CL Myers
CM Deane
D Barker
D Eisenberg
D Hanisch
D Hwang
DJ Reiss
DP Hill
DP Tan
DR Rhodes
DS Goldberg
E Nabieva
E Segal
E Sprinzak
E Wingender
EM Marcotte
F Cozman
F Sohler
FM Rijli
GD Bader
GR Lanckriet
H Hishigaki
H Ogata
H Suzuki
H Tipney
Hannah Tipney
HJ Drabkin
HY Chuang
I Iossifov
I Lee
I Xenarios
J Chen
J Cui
J Graw
J Kim
J Li
J Sun
JP Vert
JR Barrow
JS Bader
JT Eppig
L Hedges
L Hunter
L Li
L Salwinski
Lawrence Hunter
M Ashburner
M Bada
M Donalies
M Downes
M Gendron-Maguire
M Kanai-Azuma
M Kanehisa
M Krallinger
M Maconochie
MC Mikl
MP Smidt
MS Scott
MY Galperin
N Daraselia
N Nariai
OG Troyanskaya
P Dupont
P Hunt
P Lipton
P Pei
P Saraiya
P Shannon
PA Gray
PM Bowers
Priyanka Kasliwal
PW Lord
R Bellazzi
R Hoffman
R Jansen
R Saito
Richard A. Spritz
Ronald P. Schuyler
S Asthana
S Brewer
S Draghici
S Imoto
S Kerrien
S Leach
Satoru Miyano
Sonia M. Leach
T Ideker
T Matsumoto
T Schlitt
Trevor Williams
V Ferretti
W Feng
WA Baumgartner
WA Baumgartner Jr
Weiguo Feng
William A. Baumgartner
X Yang
Y Chen
Y Kamei
Y Nakayama
Y Yamanishi
Publication venue: Public Library of Science
Publication date: 01/03/2009

The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, are both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or by using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all types of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
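The Hanalyzer abstract describes two steps: a reasoning step that links genes sharing an annotation, and the computation of a combined reliability score. The Python sketch below illustrates both. It links two genes whenever they share an annotation term. It then merges per-source reliabilities with a noisy-OR rule, 1 - Π(1 - r_i). The noisy-OR combination, the gene names and the term IDs are illustrative assumptions. The abstract does not say which formula the Hanalyzer actually uses.

```python
from itertools import combinations


def shared_annotation_edges(annotations: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Link each pair of genes that share at least one annotation term."""
    edges: dict[tuple[str, str], set[str]] = {}
    for a, b in combinations(sorted(annotations), 2):
        shared = annotations[a] & annotations[b]
        if shared:
            edges[(a, b)] = shared  # keep the shared terms as provenance
    return edges


def noisy_or(reliabilities: list[float]) -> float:
    """Combine per-source reliabilities, assuming the sources err independently."""
    p_all_wrong = 1.0
    for r in reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie in [0, 1]")
        p_all_wrong *= 1.0 - r
    return 1.0 - p_all_wrong


# Hypothetical genes and annotation terms, for illustration only.
genes = {"GeneA": {"T1", "T2"}, "GeneB": {"T2"}, "GeneC": {"T3"}}
print(shared_annotation_edges(genes))  # {('GeneA', 'GeneB'): {'T2'}}
print(noisy_or([0.5, 0.5]))            # 0.75
```

Noisy-OR only raises the score as sources are added, which fits the idea of evidence accumulating across sources. However, it overstates confidence when the sources are correlated, for example when two databases copy the same curation.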

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Factors associated with mobility of the oldest old

Author: Altmets K
Baker DW
Bannermann E
Barbosa AR
Baumgartner RN
Bouchard DR
Cipriani NCS
Freitas Jr IF
Gregory PC
Guralnik JM
Hung WW
Klijs B
Miszkurka M
Nilsson CJ
Nogueira SL
Parahyba MI
Petersen KL
Sainio P
Sallinen J
Satariano WA
Stenholm S
Sudore RL
Thorpe Jr RJ
Troiano RP
Whitson HE
Williams DR
Winter CC
Publication venue: FapUNIFESP (SciELO)

Crossref

Text Mining Improves Prediction of Protein Functional Sites

Author: A Koussounadis
A Sokolov
AG Murzin
AR Atilgan
AT Laurie
BJ Grant
CA Earhart
CB Ahlers
CJO Baker
CM Nunn
CT Porter
D Ferrucci
D Ming
D Oliver
D Zhou
DS Greer
F Horn
F Leitner
GL Card
HJ Nam
HM Berman
I Bahar
J Dundas
J Laurila
J Ory
JD Cohn
JG Caporaso
JK Hurley
JM Jez
Judith D. Cohn
JY Choe
K Hinsen
K Nagel
K Verspoor
Karin M. Verspoor
KB Cohen
KE Ravikumar
KL Damm
Komandur E. Ravikumar
L Hu
L Xie
LH Weaver
LJ Jensen
LL Huang
M Ankerst
M Krallinger
ME Wall
MF Sanner
Michael E. Wall
ML Benson
MM Tirion
N Chim
Neil R. Smalheiser
PE Bourne
R Gaizauskas
R Witte
RC Edgar
S Perot
TW Schwartz
WA Baumgartner Jr
Publication venue: Public Library of Science
Publication date: 29/02/2012

We present an approach that integrates protein structure analysis and text mining for protein functional site prediction, called LEAP-FS (Literature Enhanced Automated Prediction of Functional Sites). The structure analysis was carried out using Dynamics Perturbation Analysis (DPA), which predicts functional sites at control points where interactions greatly perturb protein vibrations. The text mining component extracts mentions of residues from the literature and predicts that the mentioned residues are functionally important. We assessed the significance of each of these methods by analyzing their performance in finding known functional sites (specifically, small-molecule binding sites and catalytic sites) in about 100,000 publicly available protein structures. The DPA predictions recapitulated many of the functional site annotations and preferentially recovered binding sites annotated as biologically relevant vs. those annotated as potentially spurious. The text-based predictions were also substantially supported by the functional site annotations: compared to other residues, residues mentioned in text were roughly six times more likely to be found in a functional site. The overlap of predictions with annotations improved when the text-based and structure-based methods agreed. Our analysis also yielded new high-quality predictions of many functional site residues that were not catalogued in the curated data sources we inspected. We conclude that both DPA and text mining independently provide valuable high-throughput protein functional site predictions, and that integrating the two methods using LEAP-FS further improves the quality of these predictions.
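Two ingredients of the text-based side of LEAP-FS can be sketched in Python. The first spots residue mentions such as "Ser195". The second measures how much more often mentioned residues fall in functional sites than other residues do. The regular expression (three-letter amino-acid code, optional hyphen, residue number) is an assumption, as is reading "roughly six times more likely" as a ratio of rates. Neither is taken from the LEAP-FS implementation.

```python
import re

# Three-letter amino-acid codes, then an optional hyphen and a residue number.
AA3 = "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
RESIDUE = re.compile(rf"\b({AA3})-?(\d+)\b")


def residue_mentions(text: str) -> set[tuple[str, int]]:
    """Return (residue code, position) pairs mentioned in the text."""
    return {(m.group(1), int(m.group(2))) for m in RESIDUE.finditer(text)}


def rate_ratio(func_mentioned: int, n_mentioned: int, func_other: int, n_other: int) -> float:
    """Functional-site rate of mentioned residues divided by that of other residues."""
    return (func_mentioned / n_mentioned) / (func_other / n_other)


print(sorted(residue_mentions("The catalytic triad His57, Asp102 and Ser-195 is conserved.")))
# [('Asp', 102), ('His', 57), ('Ser', 195)]
print(round(rate_ratio(30, 100, 50, 1000), 2))  # 6.0
```

The counts in the last call are hypothetical: 30 functional residues among 100 mentioned ones, against 50 among 1,000 unmentioned ones. A real pipeline would also need to map each mention onto the correct structure and sequence numbering, which this sketch does not attempt.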

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

University of Melbourne Institutional Repository

Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies

Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level of misannotation and the types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%–63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with “overprediction” of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Ten Years of Pathway Analysis: Current Approaches and Outstanding Challenges

Author: A Bauer-Mehren
A Boorsma
A Brazma
A Shojaie
A Subramanian
AG Gilman
AL Tarca
AM Huerta
Atul J. Butte
B Efron
B Mann
B Zeeberg
B Zhang
C Backes
C Henegar
C Perez-Iratxeta
Christos A. Ouzounis
CI Castillo-Davis
D Maglott
D Martin
D Pennica
DW Huang
EI Boyle
ET Wang
F Al-Shahrour
G Bindea
G Glazko
G Joshi-Tope
GF Berriz
H Sun
H Xiong
I Dinu
J Iqbal
J Li
J Rahnenführer
J Ye
JJ Goeman
KC Li
KD Dahlquist
L Beltrame
L Klebanov
L Tian
La Martinez-Cruz
M Ackermann
M Brannon
M Haertel-Wiesmann
M Hummel
M Kanehisa
Marina Sirota
MD Robinson
ML Green
OH Tam
P Khatri
P Pavlidis
PD Karp
PD Thomas
PK Majumder
Purvesh Khatri
Q Zheng
R Braun
R Breitling
R Chen
R Edgar
RE Dolmetsch
S Chiaretti
S Draghici
S Drăghici
SB Kim
SE Calvano
SW Doniger
SW Kong
SY Kim
SY Rhee
T Beissbarth
T Breslin
TR Golub
U Mansmann
VK Mootha
WA Baumgartner Jr
WT Barry
Y Lu
Ya Grigoryev
Z Du
Z Jiang
Publication venue: Public Library of Science
Publication date: 01/01/2012

Pathway analysis has become the first choice for gaining insight into the underlying biology of differentially expressed genes and proteins, as it reduces complexity and has increased explanatory power. We discuss the evolution of knowledge base–driven pathway analysis over its first decade, distinctly divided into three generations. We also discuss the limitations that are specific to each generation, and how they are addressed by successive generations of methods. We identify a number of annotation challenges that must be addressed to enable development of the next generation of pathway analysis methods. Furthermore, we identify a number of methodological challenges that the next generation of methods must tackle to take advantage of the technological advances in genomics and proteomics in order to improve specificity, sensitivity, and relevance of pathway analysis.
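The first of the three generations discussed in this review is over-representation analysis (ORA). ORA asks whether a pathway contains more differentially expressed genes than chance would predict. Below is a minimal Python sketch of the usual one-sided hypergeometric test. It uses only the standard library (`math.comb`, Python 3.8+), and the function name is hypothetical.

```python
from math import comb


def ora_pvalue(k: int, M: int, K: int, n: int) -> float:
    """One-sided hypergeometric p-value P(X >= k).

    M: genes measured (the background), K: background genes in the pathway,
    n: differentially expressed (DE) genes, k: DE genes that fall in the pathway.
    """
    if not (0 <= K <= M and 0 <= n <= M):
        raise ValueError("need 0 <= K <= M and 0 <= n <= M")
    # Sum the probability of every overlap at least as large as the observed one.
    hits = sum(comb(K, i) * comb(M - K, n - i) for i in range(max(k, 0), min(K, n) + 1))
    return hits / comb(M, n)


print(ora_pvalue(2, 10, 5, 2))  # 10/45, about 0.222
```

The sketch treats genes as independent and applies no correction across pathways. These are among the simplifying assumptions that the later generations described in the review try to relax.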

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

FigShare

Lysosomes in iron metabolism, ageing and apoptosis

Author: A Barbouti
A Brun
A De Milito
A Simonsen
A Terman
A Viarengo
AJ Levine
Alexei Terman
AM Cuervo
AM Koorts
AS Zhang
B Garner
B Gerstbrein
B Turk
B Vaisman
BC Victor
Bertil Gustafsson
BH Yeung
C de Duve
C Settembre
CA Rupar
CA Seymour
D Berg
D Harman
D Michalovitz
D Tang
DA Wenger
DC Radisky
DJ Klionsky
DR Richardson
E Bergamini
E Nilsson
ER Stadtman
F Antunes
FN Ghadially
FQ Schafer
FY Lee
G Barja
G Kroemer
GK Gouras
GL Semenza
H Glaumann
H Miyajima
H Miyawaki
H Rochefort
H Takagi
HH Ku
HK Baumgartner
HL Persson
I De Domenico
I Podgorski
I Sakaida
I Schraufstätter
J DeGroot
J Hardy
J Nylandsted
JC Sipe
JH Walton
JM Zdolsek
JP Luzio
JW Eaton
K Kiselyov
K Kågedal
K Lorenzo
K Suzuki
L Zheng
M Elleder
M Fontenay
M Heinrich
M Kruszewski
M Navratil
M Pandolfo
M Tenopoulou
M Zhao
MC Bennett
MC Maiuri
ME Guicciardi
N Bidere
N Lane
NW Werneburg
P Arosio
P Boya
P Fattoretti
P Montcourrier
PT Doulias
R Autelli
R Bartrons
R Castino
R Martinez-Zaguilan
RD Jolly
RL Pisoni
RS Frey
S Kaushik
S Ohkuma
S Roberts
SE Nilsson
SI Rattan
SK Baird
ST Lee
T Cirman
T Kurz
T Shintani
T Yorimitsu
Tino Kurz
TZ Kidane
U Brunk
Ulf T. Brunk
UT Brunk
W Chen
W Li
WA Dunn Jr
WH Yu
X Liu
XM Yuan
Y Cao
Z Yu
Publication venue: Springer-Verlag
Publication date: 01/01/2008

The lysosomal compartment is essential for a variety of cellular functions, including the normal turnover of most long-lived proteins and all organelles. The compartment consists of numerous acidic vesicles (pH ∼4 to 5) that constantly fuse and divide. It receives a large number of hydrolases (∼50) from the trans-Golgi network, and substrates from both the cells’ outside (heterophagy) and inside (autophagy). Many macromolecules contain iron that gives rise to an iron-rich environment in lysosomes that recently have degraded such macromolecules. Iron-rich lysosomes are sensitive to oxidative stress, while ‘resting’ lysosomes, which have not recently participated in autophagic events, are not. The magnitude of oxidative stress determines the degree of lysosomal destabilization and, consequently, whether arrested growth, reparative autophagy, apoptosis, or necrosis will follow. Heterophagy is the first step in the process by which immunocompetent cells modify antigens and produce antibodies, while exocytosis of lysosomal enzymes may promote tumor invasion, angiogenesis, and metastasis. Apart from being an essential turnover process, autophagy is also a mechanism by which cells will be able to sustain temporary starvation and rid themselves of intracellular organisms that have invaded, although some pathogens have evolved mechanisms to prevent their destruction. Mutated lysosomal enzymes are the underlying cause of a number of lysosomal storage diseases involving the accumulation of materials that would be the substrate for the corresponding hydrolases, were they not defective. The normal, low-level diffusion of hydrogen peroxide into iron-rich lysosomes causes the slow formation of lipofuscin in long-lived postmitotic cells, where it occupies a substantial part of the lysosomal compartment at the end of the life span. 
This seems to result in the diversion of newly produced lysosomal enzymes away from autophagosomes, leading to the accumulation of malfunctioning mitochondria and proteins with consequent cellular dysfunction. If autophagy were a perfect turnover process, postmitotic ageing and several age-related neurodegenerative diseases would, perhaps, not take place.

Publikationer från Linköpings universitet

Crossref

Springer - Publisher Connector

PubMed Central

Digitala Vetenskapliga Arkivet - Academic Archive On-line